Bagging Does Not Always Decrease Mean Squared Error

Authors

  • Andreas Buja
  • Werner Stuetzle
Abstract

Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules. Heuristically, the averaging process should reduce the variance component of the prediction error. This is supported by empirical evidence suggesting that bagging can indeed reduce prediction error and appears to be most effective for CART trees, which are highly unstable functions of the data. We study the effects of bagging for the simple class of U-statistics. While these do not describe CART trees, U-statistics have the advantage of admitting a complete and rigorous analysis. We find that bagging always increases bias, but the effects on variance and mean squared error depend on the specifics of the U-statistic and its distribution. We also find a correspondence, to order 1/N², between bagging based on resampling with replacement and bagging based on resampling without replacement.

AT&T Labs–Research, 180 Park Ave, Florham Park, NJ 07932-0971; [email protected]

Department of Statistics, University of Washington, Seattle, WA 98195-4322; [email protected]

Research partially supported by NSF grant DMS 9803226. This work was performed while the second author was on sabbatical leave at AT&T Labs.
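The simplest form of bagging described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: `bagged_predict` and the `fit` callback (a learner that returns a prediction rule) are hypothetical names, and the sample mean is used as the learner only because it is the simplest U-statistic (degree 1).

```python
import numpy as np

def bagged_predict(x_train, y_train, x_test, fit, n_boot=100, seed=None):
    """Simplest-form bagging: draw bootstrap samples from the training
    sample, apply the learning algorithm to each, and average the
    resulting prediction rules."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # bootstrap sample (with replacement)
        predict = fit(x_train[idx], y_train[idx])  # learner returns a prediction rule
        preds.append(predict(x_test))
    return np.mean(preds, axis=0)                  # average over the bootstrap replicates

# Example learner: the sample mean, the simplest U-statistic (degree 1).
# The fitted "rule" predicts the training mean everywhere.
fit_mean = lambda xt, yt: (lambda xs: np.full(len(xs), yt.mean()))
```

For a stable statistic like the mean, the bagged prediction stays very close to the unbagged one, consistent with the paper's point that bagging's benefit, if any, comes from variance reduction in unstable learners.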


Similar References

On Bagging and Estimation in Multivariate Mixtures

Two bagging approaches, ½n-out-of-n without replacement (subagging) and n-out-of-n with replacement (bagging), have been applied to the problem of estimating the parameters of a multivariate mixture model. Monte Carlo simulations and a real data example show that both bagging methods improve the standard deviation of the maximum likelihood estimator of the mix...
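The two resampling schemes contrasted above differ only in how indices are drawn from the training sample. A minimal sketch, with an illustrative function name not taken from the reference:

```python
import numpy as np

def resample_indices(n, scheme, rng):
    """Draw one resample's index set: subagging is (1/2)n-out-of-n
    WITHOUT replacement; bagging is n-out-of-n WITH replacement."""
    if scheme == "subagging":
        return rng.choice(n, size=n // 2, replace=False)
    if scheme == "bagging":
        return rng.choice(n, size=n, replace=True)
    raise ValueError(f"unknown scheme: {scheme}")
```

A subagging resample therefore contains distinct observations only, while a bagging resample is the full size n but typically repeats some observations and omits others.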


Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...


Analyzing Bagging

Bagging is one of the most effective computationally intensive procedures for improving on unstable estimators or classifiers, and is especially useful for high-dimensional data problems. Here we formalize the notion of instability and derive theoretical results to analyze the variance reduction effect of bagging (or variants thereof), mainly in hard decision problems, which include estimation after t...


Pricing and hedging derivative securities with neural networks: Bayesian regularization, early stopping, and bagging

We study the effectiveness of cross validation, Bayesian regularization, early stopping, and bagging to mitigate overfitting and improve generalization for pricing and hedging derivative securities with daily S&P 500 index call options from January 1988 to December 1993. Our results indicate that Bayesian regularization can generate significantly smaller pricing and delta-hedging errors...


A Case Study on Bagging, Boosting, and Basic Ensembles of Neural Networks for OCR

We study the effectiveness of three neural network ensembles in improving OCR performance: (i) Basic, (ii) Bagging, and (iii) Boosting. Three random character degradation models are introduced in training individual networks in order to reduce error correlation between individual networks and to improve the generalization ability of neural networks. We compare the recognition accuracies of t...



Journal title:

Volume   Issue

Pages  -

Publication date: 2000